Allele detection using primer extension with sequence-coded identity tags

ABSTRACT

A method for determining the genotype of one or more individuals at a polymorphic locus employs amplification of a region of DNA, labeling of allele-specific extension primers containing tags, and hybridization of the products to an array of probes. The genotype is identified from the pattern of hybridization. The method can also be used to determine the frequency of different alleles in a population.

FIELD OF THE INVENTION

[0001] The invention is related to the area of genome analysis. In particular it is related to the field of identification of bases at particular locations in a nucleic acid molecule.

BACKGROUND OF THE INVENTION

[0002] Obtaining genotype information on thousands of polymorphisms in a highly parallel fashion is becoming an increasingly important task in mapping disease loci, in identifying quantitative trait loci, in diagnosing tumor loss of heterozygosity, and in performing association studies. A currently available method for simultaneously evaluating large numbers of genetic polymorphisms involves hybridization to allele-specific probes on high density oligonucleotide arrays. In order to practice that method, redundant sets of hybridization probes, typically twenty or more, are used to score each allelic marker. A high degree of redundancy is required to reduce noise and achieve an acceptable level of accuracy. Even this level of redundancy is insufficient to unambiguously score heterozygotes or to quantitatively determine allele frequency in a population.

[0003] The technique of allele-specific polymerase chain reaction (ASPCR) can be applied to allele identification and quantitative analysis of allele frequency. However, this technique suffers from cross-reactivity between amplified products when hybridizing to probes which differ by only a single nucleotide base. A partial solution to the cross-reactivity problem has been achieved by the addition of sequence tags to the ASPCR primers. The incorporation of tags in ASPCR primers can itself interfere with the identification of the amplification products because unreacted primers or partially extended products can compete with full products for hybridization to the probes. Thus, there is a further need in the art for methods and materials which permit the accurate determination of polymorphic loci without interference from incompletely reacted products.

SUMMARY OF THE INVENTION

[0004] It is an object of the invention to provide methods and compositions for the identification of nucleotides at a polymorphic locus in a nucleic acid sequence. This and other objects of the invention are provided by one or more of the embodiments described below.

[0005] In one embodiment of the invention, a method is provided to aid in detecting a selected allele of a gene in a sample. A region of single or double stranded DNA in the sample is amplified using one or a pair of amplification primers to form an amplified DNA product. The region comprises a polymorphic locus of the selected allele of the gene. An extension primer is labeled in the presence of the amplified DNA product, which serves as the template for the labeling reaction. The extension primer comprises a 3′ portion which is complementary to the amplified DNA product and a 5′ portion which is not complementary to the amplified DNA product. The extension primer also terminates in a 3′ nucleotide at the polymorphic locus of the selected allele. At least one labeled nucleotide is coupled to the 3′ terminal nucleotide of the extension primer to form a labeled extension primer. The labeled extension primer is hybridized to a probe on a solid support. All or a portion of the probe is complementary to the 5′ portion of the extension primer.

[0006] Another embodiment of the invention provides another method to aid in detecting a selected allele of a gene in a sample. A region of single or double stranded DNA in the sample is specifically amplified using one or a pair of amplification primers to form an amplified DNA product. The region comprises a polymorphic locus of the selected allele of the gene. An amplification primer terminates in a 3′ nucleotide at the polymorphic locus of the selected allele. An extension primer is labeled in the presence of the amplified DNA product, which serves as the template for the labeling reaction. The extension primer comprises a 3′ portion which is complementary to the amplified DNA product and a 5′ portion which is not complementary to the amplified DNA product. The extension primer also terminates in a 3′ nucleotide at the polymorphic locus of the selected allele. At least one labeled nucleotide is coupled to the 3′ terminal nucleotide of the extension primer to form a labeled extension primer. The labeled extension primer is hybridized to a probe on a solid support. All or a portion of the probe is complementary to the 5′ portion of the extension primer.

[0007] Yet another embodiment of the invention is a kit which comprises in a single container a set of primers for use in detecting a selected allele of a gene. The set of primers includes a pair of primers which amplify a region of the gene comprising a polymorphic locus and an extension primer which terminates in a 3′ nucleotide which is the polymorphic locus of the selected allele. A 3′ portion of the extension primer is complementary to the selected allele, and a 5′ portion of the extension primer is complementary to all or a portion of a probe on a solid support but not complementary to the amplified region of the gene.

[0008] Still another embodiment of the invention is a kit which comprises in a single container a set of primers for use in detecting an allele. The set of primers includes a pair of primers which specifically amplify a selected allele and an extension primer. The pair of primers comprises a first and a second primer. The first and second primers are complementary to opposite strands of a DNA target. The first primer and the extension primer each terminate in a 3′ nucleotide which is a polymorphic locus of the selected allele. A 3′ portion of the extension primer is complementary to the selected allele, and a 5′ portion of the extension primer is complementary to all or a portion of a probe on a solid support but not complementary to the amplified region of the DNA target.

[0009] Still another embodiment of the invention provides another method to aid in detecting a selected allele of a gene in a sample. A region of single or double stranded DNA in the sample comprises a polymorphic locus of the selected allele of the gene. An extension primer is labeled in the presence of the region of DNA which serves as the template for the labeling reaction. The extension primer comprises a 3′ portion which is complementary to the region of DNA and a 5′ portion which is not complementary to the region of DNA. The extension primer also terminates in a 3′ nucleotide at the polymorphic locus of the selected allele. At least one labeled nucleotide is coupled to the 3′ terminal nucleotide of the extension primer to form a labeled extension primer. The labeled extension primer is hybridized to a probe on a solid support. All or a portion of the probe is complementary to the 5′ portion of the extension primer.

[0010] The invention thus provides the art with sensitive and specific methods and compositions for identification of polymorphic nucleotides in a DNA sample which may be from one or more individuals.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] FIG. 1 illustrates a method of determining nucleotides at a polymorphic locus. The first step shows the use of allele specific polymerase chain reaction (ASPCR) primers to amplify only those regions of the double stranded DNA sample which contain a specific nucleotide at a polymorphic locus. In the second step, the amplification product serves as the template for a primer extension reaction. The primer contains a tag at its 5′ end and terminates in a 3′ nucleotide at the polymorphic locus. The final step shown is the hybridization of the labeled extension product to a solid support to which a probe is attached that is complementary to the tag at the 5′ end of the extension primer.

[0012] FIG. 2 illustrates another method of determining nucleotides at a polymorphic locus. The first step involves the use of polymerase chain reaction (without allele specificity) to amplify a region of the double stranded DNA sample which contains a specific polymorphic locus. In the second step, the amplification product serves as the template for a primer extension reaction. The primer contains a tag at its 5′ end and terminates in a 3′ nucleotide at the polymorphic locus. The final step shown is the hybridization of the labeled extension product to a solid support to which a probe is attached that is complementary to the tag at the 5′ end of the extension primer.

DETAILED DESCRIPTION OF THE INVENTION

[0013] It is a discovery of the present inventors that determination of a base at a polymorphic locus can be accomplished with great specificity and sensitivity by incorporating unique tags into allele-specific nucleic acids and hybridizing them to tag-specific probes on a solid support. A nucleic acid sample is optionally amplified in a manner which is either allele specific or not allele specific. The amplification products can serve as the template for a primer extension reaction using uniquely tagged, allele-specific primers. A labeled extension product is formed for each primer only if the respective allele was present in the original nucleic acid sample. Extension products corresponding to different alleles are linked to different tags. Each tag comprises a sequence that is complementary to all or part of a corresponding probe at a known location on a detection array. The use of a unique tag for each allele eliminates the problem of cross-hybridization which arises with other methods. Furthermore, the use of tags eliminates interference from unreacted amplification primers and partially extended products. Such problems can prevent unambiguous determination of polymorphic alleles.

[0014] A diploid organism, for example a human, possesses two copies of each type of autosomal gene in its somatic cells. A population of organisms may contain several variants of a gene, known as alleles. A “polymorphic locus” is a location within a genome which exhibits genetic polymorphism, i.e., a location where one or more nucleotides may vary in the genomes of different individuals. Such variations can arise due to inherited mutations, or they can arise as de novo mutations in an individual organism. An “allelic form” is a specific variant of a gene embodied in a nucleic acid molecule, e.g., genomic DNA, an RNA transcript, a cDNA, a synthetic nucleic acid bearing the sequence of the variant, or a protein molecule encoded by the variant. Different allelic forms differ from one another by single basepair substitutions (also called single nucleotide polymorphisms or SNPs), or they can differ by two or more bases. Different allelic forms can also arise by insertion or deletion mutations. Any known allelic form at a polymorphic locus can be identified and quantified with the methods described here.

[0015] Briefly, two steps can be employed to determine a polymorphic nucleotide: labeling and hybridizing. An optional amplification step can precede the labeling and can be either allele-specific or not allele-specific. Allele-specific amplification of a nucleic acid sample according to the present invention uses at least one allele-specific primer; the primer has an allele-specific 3′ end. For amplification without allele specificity, the primers lack an allele-specific 3′ end. The sample nucleic acid or amplification products are used as templates with an extension primer to add one or more nucleotides, preferably labeled, to the extension primers. Each extension primer contains a tag sequence which is complementary to all or part of a probe in an array on a solid support. Each extension primer also has an allele-specific 3′ end. The labeled extension products are hybridized to probes on a solid support. An additional optional step involves the optical detection of fluorescently labeled, hybridized amplification products.

[0016] The genotype of an individual at a polymorphic locus can be determined from the hybridization. If the nucleic acid sample being tested is derived from a population or group of individual organisms, an allele frequency or the ratio of allelic forms in the population can be quantified. A plurality of polymorphic loci in a given nucleic acid sample can be simultaneously analyzed in a single reaction mixture using a plurality of pairs of primers and/or a plurality of extension primers. Alternatively, individually labeled extension primers can be mixed and hybridized on a single solid support.

[0017] A “tag” or “sequence tag” is a nucleotide sequence which is complementary or nearly complementary to the sequence of all or a portion of a probe in an array. A tag sequence need only be sufficiently complementary to its respective probe sequence to permit specific binding between them, without sufficient binding to other probes to confuse the assignment of tag to probe. However, it is preferred that each base of a tag sequence be complementary to each corresponding base of the respective probe sequence. Tags and probes need not be identical in length. The appropriate length for tags and probes is such that a tag and its respective probe hybridize with high affinity and high specificity. Important factors include, for example, length of the tag and probe sequences, the number and position of mismatching bases, if any, and the characteristics of the solution in which hybridization is carried out, especially its ionic strength and pH. Each probe and its corresponding tag share a complementary region that preferably lacks any mismatched bases and is at least 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 35, or 40 nucleotides in length.
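
The requirement that a tag and its probe share a mismatch-free complementary region of a minimum length can be checked computationally. The following is a minimal sketch only, not part of the disclosed method: the function names (`reverse_complement`, `longest_complementary_run`, `is_acceptable_pair`) and the example sequence are hypothetical, sequences are assumed to be written 5′ to 3′ using only A, C, G, and T, and the default minimum of 12 is taken from the shortest length listed above.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def longest_complementary_run(tag, probe):
    """Length of the longest mismatch-free stretch of the tag that can pair
    (antiparallel) with a stretch of the probe.

    This equals the longest common substring between the tag and the
    reverse complement of the probe, found by dynamic programming.
    """
    target = reverse_complement(probe)
    best = 0
    prev = [0] * (len(target) + 1)
    for a in tag.upper():
        cur = [0] * (len(target) + 1)
        for j, b in enumerate(target, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def is_acceptable_pair(tag, probe, min_length=12):
    """True if tag and probe share a perfect complementary region of at
    least min_length nucleotides (hypothetical acceptance rule)."""
    return longest_complementary_run(tag, probe) >= min_length


example_tag = "GATTACAGATTACAGG"          # hypothetical 16-mer tag
example_probe = reverse_complement(example_tag)
print(is_acceptable_pair(example_tag, example_probe))
```

Because tags and probes need not be identical in length, the check looks for the longest shared region rather than comparing the two sequences end to end.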

[0018] The sequence tags are typically unrelated to the sequences of the polymorphic alleles which are being analyzed. The sequence tags are chosen for their favorable hybridization characteristics. The tags are typically selected so that they have similar hybridization characteristics to each other and minimal cross-hybridization to other tag sequences. Each sequence tag is attached to an extension primer for a particular allele, and serves as a label or address for that particular allele.

[0019] A generic solid support, corresponding to the pre-selected tag sequences, can be fabricated and used to detect the presence, absence, or ratio of specific allelic forms in a test sample. See U.S. Pat. No. 5,800,992, application Ser. No. 08/626,285 filed Apr. 4, 1996, and EP application no. 97302313.8, which are expressly incorporated by reference herein.

[0020] The DNA in the sample analyzed can be of any source, including genomic, nuclear, cDNA, mitochondrial DNA, macronuclear DNA, and micronuclear DNA. The DNA can be isolated from one or more individuals. The DNA can be purified to contain only a certain subset of cellular DNA, if desired. Any type of amplification reaction can be used, including PCR, ligase chain reaction, transcription amplification, and self-sustained sequence replication. Thus, appropriate enzymes such as DNA polymerase or DNA ligase will be used as desired by the artisan.

[0021] Each amplification primer or pair of amplification primers amplifies a region of DNA containing a polymorphic locus. Pairs of primers can comprise a first primer and a second primer. The first and second primers can be complementary to opposite strands of the DNA region to be amplified. If the amplification step is to be allele specific, the first primer of the pair terminates in a 3′ nucleotide which is complementary to a specific allelic form but not complementary to other allelic forms. If the amplification step is not to be allele specific, then the first primer terminates at its 3′ end, 5′ to the polymorphic locus.

[0022] In an alternative embodiment the amplification step can be omitted. Thus, if sufficient DNA is available, the primer extension reaction can be performed directly on sample DNA. In another alternative embodiment, amplification of the entire population of sample DNA can be performed using random primers.

[0023] The amplified DNA product or sample nucleic acid is labeled using a template-dependent primer extension reaction prior to its hybridization to a probe on a solid support. Any such reactions known in the art can be used, including but not limited to a single base extension reaction using a DNA polymerase. The extension primer is allele-specific and terminates at its 3′ end in the polymorphic locus. The extension primer contains a portion at its 3′ end which is complementary to the amplified DNA product. The extension primer also contains at its 5′ end a portion which comprises a tag. The nucleotide sequence of the tag is complementary to all or a portion of a probe on a solid support. The tag and corresponding probe sequences are specifically chosen so as not to share a complementary region with the region of DNA which is amplified; this prevents cross-hybridization of other labeled amplified products with the probe. Preferably, no probe on the solid support shares a region of complementary sequence with an amplified DNA region greater than 2, 3, 4, 5, 6, 8, or 10 consecutive bases.
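
The preference that no probe share more than a set number of consecutive complementary bases with an amplified region lends itself to a simple screen. The sketch below is illustrative only and rests on stated assumptions: the helper names (`kmers`, `probe_is_clear`) and the example sequences are hypothetical, sequences are written 5′ to 3′ with only A, C, G, and T, and the amplified region is treated as double stranded, so the probe is compared against both strands.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def probe_is_clear(probe, amplicon, max_run):
    """True if the probe shares no complementary stretch longer than
    max_run consecutive bases with either strand of the amplicon.

    A probe stretch is complementary to one amplicon strand exactly when
    it is identical to the matching stretch of the other strand, so it is
    enough to look for shared substrings of length max_run + 1 on both
    strands.
    """
    k = max_run + 1
    if len(probe) < k:
        return True
    both_strands = kmers(amplicon, k) | kmers(reverse_complement(amplicon), k)
    return kmers(probe, k).isdisjoint(both_strands)


amplicon = "ATGGCGTACCTGAACTGGATCC"      # hypothetical amplified region
print(probe_is_clear("CACACACACACACACA", amplicon, max_run=6))
print(probe_is_clear("TTTTGTACCTGATTTT", amplicon, max_run=6))
```

A stricter screen would use a smaller `max_run` (for example 2 or 3, the low end of the range given above) at the cost of rejecting more candidate probes.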

[0024] The labeled extension product can be hybridized to one or more probes which are immobilized to known locations on a solid support, e.g., in an array, microarray, high density array, beads, or microtiter dish. Each probe is complementary to a corresponding tag on an extension primer. The quantities of the label at known locations on the solid support can be compared, and the genotype can be determined for an individual or the allele frequency can be determined for a population from whom the DNA in the sample was obtained.
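
As an illustration of how label quantities at two known locations might be compared, the sketch below calls a genotype for a biallelic locus from the signals at the two tag-specific probes. It is a hypothetical scoring scheme, not one specified here: the function name `call_genotype`, the background subtraction, the minimum total signal, and the 0.3 and 0.7 cutoffs are all assumptions chosen for the example and would need to be calibrated empirically.

```python
def call_genotype(signal_1, signal_2, background=0.0,
                  min_total=100.0, het_low=0.3, het_high=0.7):
    """Call a genotype from label intensities at the probes for the tags
    of allele 1 and allele 2 (all thresholds are illustrative assumptions).

    Returns "1/1", "2/2", "1/2", or "no call".
    """
    s1 = max(signal_1 - background, 0.0)
    s2 = max(signal_2 - background, 0.0)
    total = s1 + s2
    if total < min_total:
        return "no call"            # too little signal to score the locus
    fraction_1 = s1 / total         # share of signal from allele 1
    if fraction_1 >= het_high:
        return "1/1"
    if fraction_1 <= het_low:
        return "2/2"
    return "1/2"


print(call_genotype(1000.0, 50.0))   # signal almost entirely at allele 1
print(call_genotype(500.0, 480.0))   # comparable signal at both probes
```

For a pooled sample from a population, the same `fraction_1` value could be read as a rough estimate of the allele ratio rather than converted into a discrete call.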

[0025] The reactions of the present invention can be performed in a single or multiplex format. For example, the amplification step can be performed using up to 20, 30, 40, 50, 75, 100, 150, 200, 250, or 300 different primer pairs to amplify a corresponding number of polymorphic markers. These can be pooled for the primer extension reaction, if desired. Pooling for the hybridization step is desirable so that thousands of hybridizations can be performed simultaneously. The results can be expressed qualitatively (presence or absence of given nucleotides at each polymorphic locus in a DNA sample) or quantitatively (ratio of different nucleotides at each polymorphic locus).

[0026] The ability to perform the method of the present invention in a multiplex manner for a number of different polymorphic loci simultaneously is due to the sequence tags which are present on the extension primers at their 5′ ends. The sequence tags permit the operator to ultimately sort the products of multiplex amplification and multiplex primer extension to different locations on an array. Each sequence tag on an extension primer is used for a single allele.

[0027] Sets of primers according to the present invention comprise an amplification pair and one or more extension primers. These may be packaged in a single container, preferably a divided container or package. The pair of primers amplifies a region of double stranded DNA which comprises a polymorphic locus. The extension primer has two portions, a 3′ portion which is complementary to a portion of the region of double stranded DNA which contains the polymorphic locus and a 5′ portion which is not complementary to the region of double stranded DNA. The 5′ region is the tag sequence which is complementary to the tag array which is used to sort and analyze the products of the primer extension reaction. The 3′ end of the extension primer terminates at the polymorphic locus.

[0028] So long as the components are physically attached to each other or in a single package they form a kit. Such kits can additionally include a solid support comprising at least two probes, where each probe contains a different tag. Instructions for use according to the disclosed method, enzymes for amplification, buffers and control samples can be included as components in the kit.

[0029] Advantages of the disclosed method include that just one generic tag solid support can be used to genotype any genetic marker, i.e., no specific customized solid support is needed. In addition, the pre-selected probe sequences synthesized on the solid support guarantee good hybridization results between the probe and the tag, with little interference from cross-hybridization by closely related allelic sequences.

[0030] Providing a Nucleic Acid Sample

[0031] The terms “nucleic acid” or “nucleic acid molecule” refer to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form, and unless otherwise limited, would encompass analogs of a natural nucleotide that can function in a similar manner as naturally occurring nucleotides. Suitable nucleic acid samples can contain polymorphic loci of interest. Suitable nucleic acid samples can also contain nucleic acids derived from a polymorphic locus of interest. As used herein, a nucleic acid derived from a polymorphic locus refers to a nucleic acid for whose synthesis the genomic DNA containing the polymorphic locus or a subsequence thereof has ultimately served as a template. Thus, a DNA amplified from genomic DNA, an RNA transcribed from the amplified DNA, an mRNA transcribed from the genomic DNA, or a cDNA reverse transcribed from the mRNA, etc., are all derived from the polymorphic locus, and detection of such derived products is indicative of the presence and/or abundance of the original polymorphic locus in a sample. Thus, suitable samples include, but are not limited to, isolated genomic DNA containing the gene or genes containing the polymorphic locus, an RNA transcript derived from the isolated genomic DNA, cDNA reverse transcribed from the transcript, cRNA transcribed from the cDNA, DNA amplified from the genes, RNA transcribed from amplified DNA, and the like. If the sample is a non-DNA sample, it can be converted to double stranded DNA prior to amplification per the invention, for example using reverse transcriptase and/or DNA polymerase. The sample can be derived from a single individual organism, e.g., human, animal, plant, or microbial. The sample can alternatively be derived from two or more organisms, in which case the determination will reveal information about allelic frequency within the population from which the nucleic acid sample was derived.

[0032] The nucleic acid sample can be a homogenate of cells or tissues or other biological samples. Preferably, the nucleic acid sample is a total DNA preparation of a biological sample. More preferably in some embodiments, the nucleic acid sample is the total genomic DNA isolated from a biological sample. The nucleic acid sample can be the total mRNA isolated from a biological sample. Those of skill in the art will appreciate that the total mRNA prepared with most methods includes not only the mature mRNA, but also the RNA processing intermediates and nascent pre-mRNA transcripts. For example, total mRNA purified with a poly (dT) column contains RNA molecules with poly (A) tails. Those polyA⁺ RNA molecules could be mature mRNA, RNA processing intermediates, nascent transcripts or degradation intermediates.

[0033] Biological samples can be of any biological tissue or fluid or cells from any organism. Frequently the sample will be a “clinical sample,” which is a sample derived from a patient. Clinical samples provide a rich source of information regarding the various alleles of a gene and their relation to disease. Some embodiments of the invention can be employed to detect mutations and to identify the phenotype of mutations. Such embodiments have extensive applications in clinical diagnostics and clinical studies. Typical clinical samples include, but are not limited to, sputum, blood, blood cells (e.g., white cells), tissue or fine needle biopsy samples, urine, peritoneal fluid, and pleural fluid, or cells therefrom. Biological samples can also include sections of tissues, such as frozen sections or formalin-fixed sections taken for histological purposes. Cell cultures are another typical source of biological samples. Cell cultures used as a source of DNA or RNA can be derived from a clinical sample, or can be supplied from a primary cell culture, a subculture, or a cell line from any organism.

[0034] Amplification

[0035] The nucleic acid sample can be subjected to amplification prior to hybridization and detection of an allelic marker. Methods for amplification of a nucleic acid are well known in the art. In general, amplification of a nucleic acid sample employs a pair of single-stranded oligonucleotide primers together with an enzyme, e.g., DNA polymerase, which replicates (amplifies) a region of the nucleic acid sample, resulting in multiple copies of the region delimited by the sequences that are complementary to the primers. The pair of primers is chosen so as to amplify a region of the nucleic acid sample containing the polymorphic locus. The size of the region amplified is not critical, but the region must be sufficiently large to include not only the polymorphic locus but also enough sequence on either side of the polymorphic locus to permit highly specific binding of the pair of primers to the chosen region. Strategies for designing and synthesizing primers suitable for amplification of a specific region of a nucleic acid sample are known in the art. As is known in the art, each primer of a pair of amplification primers hybridizes to, and is preferably complementary to, opposite strands of an allele. It is preferred that the primers hybridize to a double stranded nucleic acid in locations which are not more than 2 kb apart, and preferably which are much closer together, such as not more than 1 kb, 0.5 kb, 0.2 kb, 0.1 kb, 0.01 kb or 0.001 kb apart. A suitable DNA polymerase can be used as is known in the art. Thermostable polymerases are particularly convenient for thermal cycling of rounds of primer hybridization, polymerization, and melting. Amplification of single stranded nucleic acids can also be employed.

[0036] A preferred amplification method is allele-specific amplification. See Okayama et al., J. Lab. Clin. Med. 114:105-113 (1989). In allele-specific amplification, a nucleotide substitution which is characteristic of a given allele is placed at the 3′ end of one of the primers. Only that allele which is complementary to the primer will be amplified; another allele, which contains a different nucleotide substitution and is not complementary to the 3′ end of the primer, will not be amplified. The amplification reaction itself can be carried out according to the polymerase chain reaction (PCR) (see PCR Protocols, A Guide to Methods and Applications, Innis et al., Academic Press, Inc. N.Y., (1990)) or another suitable amplification method. Other suitable amplification methods include, but are not limited to, ligase chain reaction (LCR) (see Wu and Wallace, Genomics, 4: 560 (1989); Landegren, et al., Science, 241: 1077 (1988); and Barringer, et al., Gene, 89: 117 (1990)), transcription amplification (Kwoh, et al., Proc. Natl. Acad. Sci. USA, 86: 1173 (1989)), and self-sustained sequence replication (Guatelli, et al., Proc. Natl. Acad. Sci. USA, 87: 1874 (1990)).

[0037] One of skill in the art will appreciate that whatever amplification method is used, if a quantitative result is desired, care must be taken to use a method that maintains or controls for the relative frequencies of the amplified nucleic acids to achieve quantitative amplification. Methods of quantitative amplification are well known to those of skill in the art. For example, quantitative PCR may involve simultaneously co-amplifying a known quantity of a control sequence using the same primers used to amplify the nucleic acids of interest. This provides an internal standard that can be used to calibrate the PCR reaction. The high density array can then include probes specific to the internal standard for quantification of the amplified nucleic acid. Detailed protocols for quantitative PCR are provided in PCR Protocols, A Guide to Methods and Applications, Innis et al., Academic Press, Inc. N.Y., (1990).
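
The role of the internal standard can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that array signal is proportional to the amount of amplified product and that the control sequence and the target amplify with equal efficiency; neither assumption is stated in the text, and the function name `estimate_quantity` and the numbers used are hypothetical.

```python
def estimate_quantity(target_signal, standard_signal, standard_quantity):
    """Estimate the starting quantity of a target from its signal, using a
    co-amplified internal standard of known starting quantity.

    Assumes signal scales linearly with quantity and that target and
    standard amplify equally well (illustrative assumptions only).
    """
    if standard_signal <= 0:
        raise ValueError("internal standard gave no usable signal")
    return target_signal / standard_signal * standard_quantity


# Hypothetical readings: the standard was added at 10 units and reads 150;
# a target reading 300 is then estimated at 20 units.
print(estimate_quantity(300.0, 150.0, 10.0))
```

In practice a calibration curve built from several standard quantities would be more robust than a single-point ratio.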

[0038] After the amplification it may be desirable to remove and/or degrade any excess primers and nucleotides. This can be done by washing and/or enzymatic degradation, using such enzymes as endonuclease I and alkaline phosphatase, for example. Other techniques, such as chromatography, magnetic beads, and avidin- or streptavidin-conjugated beads, as are known in the art for accomplishing the removal can also be used. It is not necessary to remove or destroy one of two strands of an amplified DNA product.

[0039] Labeling the Extension Primer

[0040] The primer extension step of the method provides allele-specificity. The primer is designed to terminate at the position of the polymorphic locus. The primer is hybridized to the denatured amplified double stranded DNA. The primer can be extended by one or more labeled nucleotides using, e.g., a mixture of nucleoside triphosphates and a DNA polymerase. A variation of the primer extension reaction called the single base extension reaction can be used. In single-base extension, dideoxynucleotides are used, which permit only the addition of a single nucleotide to the primer. Any DNA-dependent DNA polymerase can be used. These include, but are not limited to, E. coli DNA polymerase I, Klenow fragment of DNA polymerase I, T4 DNA polymerase, T7 DNA polymerase, and T. aquaticus DNA polymerase. The extension reaction is preferably performed at the T_m of the primer with the template to enhance product formation.
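
The allele specificity of single-base extension can be modeled in a few lines. The sketch below is a deliberately simplified, hypothetical model and not the disclosed protocol: it treats extension as all-or-nothing, requiring the template-complementary 3′ portion of the primer (the 5′ tag is left out) to pair perfectly with the template, including the 3′ terminal nucleotide at the polymorphic locus. The function name `single_base_extension` and the sequences are invented for the example, and all sequences are written 5′ to 3′.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def single_base_extension(template, primer_3prime):
    """Return the single labeled nucleotide added to the primer, or None
    if the primer does not anneal perfectly (simplified model).

    The primer pairs antiparallel with the template, so its binding site
    on the template is the reverse complement of the primer. The base
    added is the complement of the template base immediately 5' of that
    site.
    """
    site = reverse_complement(primer_3prime)
    start = template.upper().find(site)
    if start <= 0:
        return None          # no perfect match, or nothing left to copy
    return COMPLEMENT[template.upper()[start - 1]]


# Hypothetical templates differing only at the polymorphic base (index 3).
template_a = "TTCAGGTACCATG"
template_g = "TTCGGGTACCATG"
primer_for_a = "TGGTACCT"    # 3' terminal T pairs with the A allele

print(single_base_extension(template_a, primer_for_a))  # extended
print(single_base_extension(template_g, primer_for_a))  # not extended
```

A real polymerase tolerates some mismatches away from the 3′ end, so this model overstates the stringency of annealing while capturing why a mismatched 3′ terminal nucleotide blocks labeling.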

[0041] One configuration for carrying out the primer extension step utilizes two different primers which each hybridize to opposite strands of an amplified double stranded DNA. Each primer terminates at the polymorphic locus. The primer extension reaction may be more robust with one strand as a template than the other. In addition, the information obtained from the second strand should confirm the information obtained from the first strand. The primers can bear the same or different 5′ tags.

[0042] An alternative method for primer extension involves use of reverse transcriptase and one or two primers which hybridize 3′ to the polymorphic locus and terminate at the locus. This method may be desirable in cases where “forward” direction primer extension is less robust than is desirable.

[0043] The nucleotides added by the primer extension reaction are labeled. The label can be covalently attached to the nucleoside triphosphates which serve as reactants for the extension reaction. The label can be a fluorescent label (e.g., fluorescein, Texas red, rhodamine, green fluorescent protein, and the like) or other label as defined under “Signal Detection” below.

[0044] Hybridizing Nucleic Acids to Arrays of Allele-specific Probes

[0045] “Hybridization” refers to the formation of a bimolecular complex of two different nucleic acids through complementary base pairing. Complementary base pairing occurs through non-covalent bonding, usually hydrogen bonding, of bases that specifically recognize other bases, as in the bonding of complementary bases in double-stranded DNA. In this invention, hybridization is carried out between a 5′ tag and at least one probe which has been immobilized on a substrate to form an array.

[0046] One of skill in the art will appreciate that an enormous number of array designs are suitable for the practice of this invention. An array will typically include a number of probes that specifically hybridize to the sequences (tags) of interest. It is preferred that an array include one or more control probes. In one embodiment, the array is a high density array. A high density array is an array used to hybridize with a target nucleic acid sample to detect the presence of a large number of allelic markers, preferably more than 10, more preferably more than 100, and most preferably more than 1000 allelic markers.

[0047] High density arrays are suitable for quantifying small variationsin the frequency of an allelic marker in the presence of a largepopulation of heterogeneous nucleic acids. Such high density arrays canbe fabricated either by de novo synthesis on a substrate or by spottingor transporting nucleic acid sequences onto specific locations of asubstrate. Both of these methods produce nucleic acids which areimmobilized on the array at particular locations. Nucleic acids can bepurified and/or isolated from biological materials, such as a bacterialplasmid containing a cloned segment of a sequence of interest. Suitablenucleic acids can also be produced by amplification of templates or bysynthesis. As a nonlimiting illustration, polymerase chain reaction,and/or in vitro transcription are suitable nucleic acid amplificationmethods.

[0048] Probe Design

[0049] The “probes” used here are specially designed to hybridize to a corresponding “tag”. Both the probe and tag sequences are specially chosen, typically artificial, oligonucleotide sequences that are unrelated to the rest of the target nucleic acid sequence (the amplified region containing the polymorphism). Furthermore, the probe sequence is chosen so as to avoid or minimize cross-reactivity or hybridization with any portion of the target nucleic acid except the tag sequence.
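
As a rough illustration of the probe/tag relationship described above, the following Python sketch derives a probe as the reverse complement of a tag and applies a crude screen for unwanted complementarity to the rest of the target. The function names, the example sequences, and the 8-base window are assumptions made for illustration only; they are not part of the disclosed method, and a real design would also weigh melting temperature and secondary structure.

```python
# Illustrative sketch only: names and the k-mer screen are assumptions.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def probe_for_tag(tag: str) -> str:
    """Return the sequence (written 5'->3') perfectly complementary to `tag`."""
    return tag.upper().translate(_COMPLEMENT)[::-1]


def may_cross_hybridize(probe: str, target: str, k: int = 8) -> bool:
    """Crude screen: True if any k-base stretch of `target` could pair with `probe`.

    A probe pairs with its own reverse complement, so the check looks for
    k-mers of that reverse complement inside `target`. The caller is expected
    to pass the target sequence without the tag itself.
    """
    binding_site = probe_for_tag(probe)  # the sequence this probe would bind
    target = target.upper()
    return any(
        binding_site[i:i + k] in target
        for i in range(len(binding_site) - k + 1)
    )


tag = "GATTACAGATTACAGATTAC"          # hypothetical 20-base tag
probe = probe_for_tag(tag)
print(probe)
print(may_cross_hybridize(probe, "CCCCGATTACAGCCCC"))  # shares an 8-mer with the tag
print(may_cross_hybridize(probe, "CCCCCCCCCCCCCCCC"))  # no shared 8-mer
```

A probe that fails such a screen against the amplified region would be a candidate for redesign, in keeping with the goal of minimizing cross-reactivity with any portion of the target other than the tag.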

[0050] Various types of probes can be included in an array. An array includes “test probes.” Test probes can be oligonucleotides that range from about 5 to about 45 or 5 to about 500 nucleotides, more preferably from about 10 to about 40 nucleotides and most preferably from about 15 to about 40 nucleotides in length. In particularly preferred embodiments the probes are 20 to 25 nucleotides in length. In another embodiment, test probes are double or single stranded DNA sequences. DNA sequences can be isolated or cloned from natural sources or amplified from natural sources using natural nucleic acids as templates. However, in situ synthesis of probes on the arrays is preferred. The probes have sequences complementary to particular tag sequences of the amplified DNA product which they are designed to detect. Thus, the test probes are capable of specifically hybridizing to the tag portion of the target nucleic acid they are designed to detect.

[0051] The term “perfect match probe” refers to a probe which has a sequence that is perfectly complementary to a particular target sequence. The probe is typically perfectly complementary to a portion (subsequence) of the target sequence. The perfect match probe can be a “test probe,” a “normalization control probe,” an expression level control probe and the like. A perfect match control or perfect match probe is, however, distinguished from a “mismatch control” or “mismatch probe” or “mismatch control probe.”

[0052] In addition to test probes that bind the tags of interest, the high density array can contain a number of control probes. The control probes fall into two categories: normalization controls and mismatch controls.

[0053] Normalization controls are oligonucleotide or other nucleic acid probes that are complementary to labeled reference oligonucleotides or other nucleic acid sequences that are added to the nucleic acid sample. The signals obtained from the normalization controls after hybridization provide a control for variations in hybridization conditions, label intensity, “reading” efficiency, and other factors that may cause the signal of a perfect hybridization to vary between arrays. In a preferred embodiment, signals (e.g., fluorescence intensity) read from all other probes in the array are divided by the signal (e.g., fluorescence intensity) from the control probes, thereby normalizing the measurements.

[0054] Virtually any probe can serve as a normalization control. However, it is recognized that hybridization efficiency varies with base composition and probe length. Preferred normalization probes are selected to reflect the average length of the other probes present in the array; however, they can be selected to cover a range of lengths. The normalization control(s) can also be selected to reflect the (average) base composition of the other probes in the array; however in a preferred embodiment, only one or a few normalization probes are used and they are selected such that they hybridize well (i.e., no secondary structure) and do not match any target-specific probes.
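
The division step in the preferred embodiment can be sketched as follows. This is a minimal Python illustration, assuming that intensities are held in a dictionary keyed by hypothetical probe identifiers and that several normalization controls are averaged before dividing; the text itself does not specify how multiple control signals are combined.

```python
# Illustrative sketch only: probe identifiers and the use of a mean are assumptions.
def normalize(signals: dict, control_ids: list) -> dict:
    """Divide every non-control signal by the mean normalization-control signal."""
    control_mean = sum(signals[c] for c in control_ids) / len(control_ids)
    if control_mean <= 0:
        raise ValueError("normalization controls gave no usable signal")
    return {
        probe_id: value / control_mean
        for probe_id, value in signals.items()
        if probe_id not in control_ids
    }


raw = {"tag_1": 200.0, "tag_2": 50.0, "norm_1": 100.0, "norm_2": 300.0}
print(normalize(raw, ["norm_1", "norm_2"]))  # {'tag_1': 1.0, 'tag_2': 0.25}
```

Normalized values from two arrays can then be compared even if the arrays differ in overall hybridization or reading efficiency.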

[0055] Mismatch controls can also be provided for the probes to the tags or for normalization controls. The terms “mismatch control” or “mismatch probe” or “mismatch control probe” refer to a probe whose sequence is deliberately selected not to be perfectly complementary to a particular tag. Mismatch controls are oligonucleotide probes or other nucleic acid probes identical to their corresponding test or control probes except for the presence of one or more mismatched bases. A mismatched base is a base selected so that it is not complementary to the corresponding base in the tag to which the probe would otherwise specifically hybridize. One or more mismatches are selected such that under appropriate hybridization conditions (e.g., stringent conditions) the test or control probe would be expected to hybridize with its target sequence, but the mismatch probe would not hybridize (or would hybridize to a significantly lesser extent). Preferred mismatch probes contain a central mismatch. Thus, for example, where a probe is a 20-mer, a corresponding mismatch probe will have the identical sequence except for a single base mismatch (e.g., substituting a G, a C, or a T for an A) at any of positions 6 through 14 (the central mismatch).

[0056] For each mismatch control in a high-density array there typically exists a corresponding perfect match probe that is perfectly complementary to the same particular tag. The mismatch may comprise one or more bases. While the mismatch(es) may be located anywhere in the mismatch probe, terminal mismatches are less desirable, as a terminal mismatch is less likely to prevent hybridization of the tag. In a particularly preferred embodiment, the mismatch is located at or near the center of the probe such that the mismatch is most likely to destabilize the duplex with the tag under the test hybridization conditions.
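
The construction of a central-mismatch control from a perfect match probe can be sketched in Python as below. The substitution table (each base replaced by its Watson-Crick complement) and the default choice of the middle position are assumptions for illustration; the text permits any non-complementary base at any central position.

```python
# Illustrative sketch only: the substitution table and default position are assumptions.
_SWAP = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(probe: str, position=None) -> str:
    """Return `probe` with a single substituted base at a 1-based `position`.

    When `position` is omitted, the middle base is used (position 10 for a
    20-mer, which lies inside the central window of positions 6 through 14).
    """
    probe = probe.upper()
    if position is None:
        position = (len(probe) + 1) // 2
    if not 1 <= position <= len(probe):
        raise ValueError("position lies outside the probe")
    i = position - 1
    return probe[:i] + _SWAP[probe[i]] + probe[i + 1:]


perfect_match = "ACGTACGTACGTACGTACGT"   # hypothetical 20-mer probe
print(mismatch_probe(perfect_match))      # differs from the probe only at position 10
```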

[0057] Mismatch probes provide a control for non-specific binding or cross-hybridization to a nucleic acid in the sample other than the tag to which the probe is directed. Mismatch probes thus indicate whether or not a hybridization is specific. For example, if the tag is present, the perfect match probes should be consistently brighter than the mismatch probes. The difference in intensity between the perfect match and the mismatch probe (I_PM − I_MM) provides a good measure of the concentration of the hybridized material.
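
A minimal Python sketch of this comparison follows. It assumes that several perfect match/mismatch probe pairs are read for one tag and summarizes them by the mean of I_PM − I_MM and by the fraction of pairs in which the perfect match probe is brighter; both summary statistics and the function name are illustrative choices rather than values prescribed by the text.

```python
# Illustrative sketch only: the summary statistics chosen here are assumptions.
def pm_mm_summary(pairs):
    """Summarize (I_PM, I_MM) intensity pairs measured for one tag.

    Returns (mean of I_PM - I_MM, fraction of pairs with I_PM > I_MM).
    """
    if not pairs:
        raise ValueError("at least one probe pair is required")
    diffs = [pm - mm for pm, mm in pairs]
    mean_diff = sum(diffs) / len(diffs)
    fraction_pm_brighter = sum(d > 0 for d in diffs) / len(diffs)
    return mean_diff, fraction_pm_brighter


pairs = [(1000, 200), (800, 300), (100, 150), (600, 350)]
print(pm_mm_summary(pairs))  # (375.0, 0.75)
```

A high mean difference together with a high fraction of brighter perfect match probes is consistent with specific hybridization of the tag.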

[0058] The array can also include sample preparation/amplification control probes. These are probes that are complementary to subsequences of control genes selected because they do not normally occur in the nucleic acids of the particular biological sample being assayed. Suitable sample preparation/amplification control probes include, for example, probes to bacterial genes (e.g., Bio B) where the sample in question is from a eukaryote.

[0059] In a preferred embodiment, oligonucleotide probes in the high density array are selected to bind specifically to the tags to which they are directed with minimal non-specific binding or cross-hybridization under the particular hybridization conditions utilized. Because the high density arrays of this invention can contain in excess of 1,000,000 different probes, it is possible to provide every probe of a characteristic length that binds to a particular nucleic acid sequence. Thus, for example, the high density array can contain every possible 20-mer sequence complementary to an IL-2 mRNA. However, there may exist 20-mer subsequences that are not unique to the IL-2 mRNA. Probes directed to these subsequences are expected to cross-hybridize with occurrences of their complementary sequence in other regions of the sample genome. Similarly, other probes simply may not hybridize effectively under the hybridization conditions (e.g., due to secondary structure, or interactions with the substrate or other probes). Thus, in a preferred embodiment, the probes that show such poor specificity or hybridization efficiency are identified and excluded either in the high density array itself (e.g., during fabrication of the array) or in the post-hybridization data analysis.

[0060] Forming High Density Arrays

[0061] High density arrays are particularly useful for monitoring the presence of allelic markers. The fabrication and application of high density arrays in gene expression monitoring have been disclosed previously in, for example, WO 97/10365, WO 92/10588, U.S. application Ser. No. 08/772,376 filed Dec. 23, 1996; Ser. No. 08/529,115 filed on Sep. 15, 1995; Ser. No. 08/168,904 filed Dec. 15, 1993; Ser. No. 07/624,114 filed on Dec. 6, 1990, Ser. No. 07/362,901 filed Jun. 7, 1990, all incorporated herein for all purposes by reference. In some embodiments using high density arrays, high density oligonucleotide arrays are synthesized using methods such as the Very Large Scale Immobilized Polymer Synthesis (VLSIPS) disclosed in U.S. Pat. No. 5,445,934 incorporated herein for all purposes by reference. Each oligonucleotide occupies a known location on a substrate. A nucleic acid target sample is hybridized with a high density array of oligonucleotides and then the amount of target nucleic acids hybridized to each probe in the array is quantified.

[0062] Synthesized oligonucleotide arrays are particularly preferred for this invention. Oligonucleotide arrays have numerous advantages over other methods, such as efficiency of production, reduced intra- and inter-array variability, increased information content, and high signal-to-noise ratio.

[0063] Preferred high density arrays comprise greater than about 100, preferably greater than about 1000, more preferably greater than about 16,000, and most preferably greater than 65,000 or 250,000 or even greater than about 1,000,000 different oligonucleotide probes, preferably in less than 1 cm² of surface area. The oligonucleotide probes range from about 5 to about 50 or about 500 nucleotides, more preferably from about 10 to about 40 nucleotides, and most preferably from about 15 to about 40 nucleotides in length.

[0064] Methods of forming high density arrays of oligonucleotides, peptides and other polymer sequences with a minimal number of synthetic steps are known. The oligonucleotide analogue array can be synthesized on a solid substrate by a variety of methods, including, but not limited to, light-directed chemical coupling and mechanically directed coupling. See Pirrung et al., U.S. Pat. No. 5,143,854 (see also PCT Application No. WO 90/15070) and Fodor et al., PCT Publication Nos. WO 92/10092 and WO 93/09668 and U.S. Ser. No. 07/980,523, which disclose methods of forming vast arrays of peptides, oligonucleotides and other molecules using, for example, light-directed synthesis techniques. See also, Fodor et al., Science, 251, 767-77 (1991). These procedures for synthesis of polymer arrays are now referred to as VLSIPS™ procedures. Using the VLSIPS™ approach, one heterogeneous array of polymers is converted, through simultaneous coupling at a number of reaction sites, into a different heterogeneous array. See, U.S. application Ser. Nos. 07/796,243 and 07/980,523.

[0065] The development of VLSIPS™ technology as described in the above-noted U.S. Pat. No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 92/10092, is considered pioneering technology in the fields of combinatorial synthesis and screening of combinatorial libraries. More recently, patent application Ser. No. 08/082,937, filed Jun. 25, 1993, describes methods for making arrays of oligonucleotide probes that can be used to check or determine a partial or complete sequence of a target nucleic acid and to detect the presence of a nucleic acid containing a specific oligonucleotide sequence.

[0066] In brief, the light-directed combinatorial synthesis of oligonucleotide arrays on a glass surface proceeds using automated phosphoramidite chemistry and chip masking techniques. In one specific implementation, a glass surface is derivatized with a silane reagent containing a functional group, e.g., a hydroxyl or amine group blocked by a photolabile protecting group. Photolysis through a photolithographic mask is used selectively to expose functional groups which are then ready to react with incoming 5′-photoprotected nucleoside phosphoramidites. The phosphoramidites react only with those sites which are illuminated (and thus exposed by removal of the photolabile blocking group). Thus, the phosphoramidites only add to those areas selectively exposed from the preceding step. These steps are repeated until the desired array of sequences has been synthesized on the solid surface. Combinatorial synthesis of different oligonucleotide analogues at different locations on the array is determined by the pattern of illumination during synthesis and the order of addition of coupling reagents.

[0067] In the event that an oligonucleotide analogue with a polyamide backbone is used in the VLSIPS™ procedure, it is generally inappropriate to use phosphoramidite chemistry to perform the synthetic steps, since the monomers do not attach to one another via a phosphate linkage. Instead, peptide synthetic methods are substituted. See, e.g., Pirrung et al. U.S. Pat. No. 5,143,854. Peptide nucleic acids, which comprise a polyamide backbone and the bases found in naturally occurring nucleosides, are commercially available from, e.g., Biosearch, Inc. (Bedford, Mass.). Peptide nucleic acids are capable of binding to nucleic acids with high specificity, and are considered “oligonucleotide analogues” for purposes of this disclosure.

[0068] Additional methods which can be used to generate an array of oligonucleotides on a single substrate are described in co-pending applications Ser. No. 07/980,523, filed Nov. 20, 1992, and Ser. No. 07/796,243, filed Nov. 22, 1991 and in PCT Publication No. WO 93/09668. In the methods disclosed in these applications, reagents are delivered to the substrate by either (1) flowing within a channel defined on predefined regions or (2) “spotting” on predefined regions or (3) through the use of photoresist. However, other approaches, as well as combinations of spotting and flowing, can be employed. In each instance, certain activated regions of the substrate are mechanically separated from other regions when the monomer solutions are delivered to the various reaction sites.

[0069] A typical “flow channel” method applied to the compounds and libraries of the present invention can generally be described as follows. Diverse polymer sequences are synthesized at selected regions of a substrate or solid support by forming flow channels on a surface of the substrate through which appropriate reagents flow or in which appropriate reagents are placed. For example, assume a monomer “A” is to be bound to the substrate in a first group of selected regions. If necessary, all or part of the surface of the substrate in all or a part of the selected regions is activated for binding by, for example, flowing appropriate reagents through all or some of the channels, or by washing the entire substrate with appropriate reagents. After placement of a channel block on the surface of the substrate, a reagent having the monomer A flows through or is placed in all or some of the channel(s). The channels provide fluid contact to the first selected regions, thereby binding the monomer A on the substrate directly or indirectly (via a spacer) in the first selected regions.

[0070] Thereafter, a monomer “B” is coupled to second selected regions, some of which can be included among the first selected regions. The second selected regions will be in fluid contact with a second flow channel(s) through translation, rotation, or replacement of the channel block on the surface of the substrate; through opening or closing a selected valve; or through deposition of a layer of chemical or photoresist. If necessary, a step is performed for activating at least the second regions. Thereafter, the monomer B is flowed through or placed in the second flow channel(s), binding monomer B at the second selected locations. In this particular example, the resulting sequences bound to the substrate at this stage of processing will be, for example, A, B, and AB. The process is repeated to form a vast array of sequences of desired length at known locations on the substrate.

[0071] After the substrate is activated, monomer A can be flowed through some of the channels, monomer B can be flowed through other channels, a monomer C can be flowed through still other channels, etc. In this manner, many or all of the reaction regions are reacted with a monomer before the channel block must be moved or the substrate must be washed and/or reactivated. By making use of many or all of the available reaction regions simultaneously, the number of washing and activation steps can be minimized.

[0072] One of skill in the art will recognize that there are alternative methods of forming channels or otherwise protecting a portion of the surface of the substrate. For example, according to some embodiments, a protective coating such as a hydrophilic or hydrophobic coating (depending upon the nature of the solvent) is utilized over portions of the substrate to be protected, sometimes in combination with materials that facilitate wetting by the reactant solution in other regions. In this manner, the flowing solutions are further prevented from passing outside of their designated flow paths.
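
The two-step example above (monomer A in the first selected regions, monomer B in partly overlapping second regions, giving A, B, and AB) can be mimicked with a short Python simulation. The region numbering and the `couple` helper are hypothetical bookkeeping devices; the sketch tracks only which monomers have been coupled where and ignores activation, washing, and channel-block handling.

```python
# Illustrative sketch only: region numbers and the helper name are assumptions.
def couple(regions: dict, selected: set, monomer: str) -> None:
    """Append `monomer` to the growing sequence in each selected region (in place)."""
    for region in selected:
        regions[region] += monomer


# Four reaction regions on the substrate, all initially empty.
regions = {region: "" for region in range(4)}

couple(regions, {0, 1}, "A")   # first selected regions receive monomer A
couple(regions, {1, 2}, "B")   # second selected regions overlap the first at region 1

print(regions)  # {0: 'A', 1: 'AB', 2: 'B', 3: ''}
```

Repeating the coupling step with further monomers and different selections builds up sequences of the desired length at known locations.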

[0073] High density nucleic acid arrays can be fabricated by depositing presynthesized or natural nucleic acids in predetermined positions. Synthesized or natural nucleic acids are deposited on specific locations of a substrate by light-directed targeting and oligonucleotide-directed targeting. Nucleic acids can also be directed to specific locations in much the same manner as the flow channel methods. For example, a nucleic acid A can be delivered to and coupled with a first group of reaction regions which have been appropriately activated. Thereafter, a nucleic acid B can be delivered to and reacted with a second group of activated reaction regions. Nucleic acids are deposited in selected regions. Another embodiment uses a dispenser that moves from region to region to deposit nucleic acids in specific spots. Typical dispensers include a micropipette or capillary pin to deliver nucleic acid to the substrate and a robotic system to control the position of the micropipette with respect to the substrate. In other embodiments, the dispenser includes a series of tubes, a manifold, an array of pipettes or capillary pins, or the like so that various reagents can be delivered to the reaction regions simultaneously.

[0074] Hybridization Conditions

[0075] The term “stringent conditions” refers to conditions under which a probe will hybridize to its tag subsequence, but with only insubstantial hybridization to other sequences, or with hybridization to other sequences that is sufficiently weaker that the difference may be identified. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (T_m) for the specific sequence at a defined ionic strength and pH.

[0076] The T_m is the temperature, under defined ionic strength, pH, and nucleic acid concentration, at which 50% of the probes complementary to the target sequence hybridize to the target sequence at equilibrium. (As the target sequences are generally present in excess, at T_m, 50% of the probes are occupied at equilibrium.) Typically, stringent conditions will be those in which the salt concentration is at least about 0.01 to 1.0 M concentration of a Na or other salt at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides). Stringent conditions can also be achieved with the addition of destabilizing agents such as formamide.
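
To make the “about 5° C. below the melting point” guideline concrete, the Python sketch below estimates a melting temperature and subtracts the offset. The estimate uses the Wallace rule of thumb (2° C. per A or T, 4° C. per G or C), which is an assumption introduced here and is not taken from this disclosure; it is only a rough guide for short oligonucleotides and ignores the ionic strength, pH, and concentration dependence noted above. An empirically measured melting temperature should replace it wherever one is available.

```python
# Illustrative sketch only: the Wallace rule is a swapped-in rule of thumb,
# not the method of this disclosure, and is crude outside roughly 14-20 bases.
def wallace_tm(seq: str) -> float:
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(seq: str, offset: float = 5.0) -> float:
    """Temperature about `offset` degrees C below the estimated melting point."""
    return wallace_tm(seq) - offset


probe = "ATGCATGCATGCATGCATGC"          # hypothetical 20-mer, 50% G+C
print(wallace_tm(probe))                 # 60.0
print(stringent_temperature(probe))      # 55.0
```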

[0077] The phrase “hybridizing specifically to” refers to the binding, duplexing, or hybridizing of a molecule substantially to or only to a particular nucleotide sequence or sequences under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular) of DNA or RNA. It is generally recognized that nucleic acids are denatured by increasing the temperature or decreasing the salt concentration of the buffer containing the nucleic acids. Under low stringency conditions (e.g., low temperature and/or high salt) hybrid duplexes (e.g., DNA:DNA, RNA:RNA, or RNA:DNA) will form even where the annealed sequences are not perfectly complementary. Thus, specificity of hybridization is reduced at lower stringency. Conversely, at higher stringency (e.g., higher temperature or lower salt) successful hybridization requires fewer mismatches.

[0078] One of skill in the art will appreciate that hybridization conditions can be selected to provide any degree of stringency. In a preferred embodiment, hybridization is performed at low stringency, in this case in 6× SSPE-T at 37° C. (0.005% Triton X-100), to ensure hybridization, and then subsequent washes are performed at higher stringency (e.g., 1× SSPE-T at 37° C.) to eliminate mismatched hybrid duplexes. Successive washes can be performed at increasingly higher stringency (e.g., down to as low as 0.25× SSPE-T at 37° C. to 50° C.) until a desired level of hybridization specificity is obtained. Stringency can also be increased by addition of agents such as formamide. Hybridization specificity can be evaluated by comparison of hybridization to the test probes with hybridization to the various controls that can be present (e.g., expression level control, normalization control, mismatch controls, etc.).

[0079] In general, there is a tradeoff between hybridization specificity (stringency) and signal intensity. Thus, in a preferred embodiment, the wash is performed at the highest stringency that produces consistent results and that provides a signal intensity greater than approximately 10% of the background intensity. Thus, in a preferred embodiment, the hybridized array can be washed at successively higher stringency solutions and read between each wash. Analysis of the data sets thus produced will reveal a wash stringency above which the hybridization pattern is not appreciably altered and which provides adequate signal for the particular oligonucleotide probes of interest.
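
One way to analyze a series of reads taken between successively more stringent washes is sketched below in Python. The criterion used, a maximum per-probe relative change of at most 10% between consecutive reads, is an assumed stand-in for “not appreciably altered”; the function names and the example intensities are likewise hypothetical.

```python
# Illustrative sketch only: the 10% tolerance and the data layout are assumptions.
def max_relative_change(before, after) -> float:
    """Largest per-probe relative change between two reads of the same array."""
    return max(
        (abs(b - a) / b for b, a in zip(before, after) if b > 0),
        default=0.0,
    )


def first_stable_wash(scans, tolerance: float = 0.10) -> int:
    """Index of the earliest wash after which every later read stays within tolerance.

    `scans` holds one list of probe intensities per wash, ordered by increasing
    stringency. If even the last two reads differ by more than the tolerance,
    the index of the final wash is returned, meaning no plateau was observed.
    """
    stable_from = len(scans) - 1
    for i in range(len(scans) - 2, -1, -1):
        if max_relative_change(scans[i], scans[i + 1]) <= tolerance:
            stable_from = i
        else:
            break
    return stable_from


scans = [
    [1000, 800, 50],   # read after the first, least stringent wash
    [900, 300, 48],    # mismatched duplexes on the second probe are removed
    [880, 290, 47],
    [870, 285, 47],
]
print(first_stable_wash(scans))  # 1
```

The signal remaining at the selected wash would still need to be checked against background before that stringency is adopted.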

[0080] The stability of duplexes formed between RNAs or DNAs is generally in the order of RNA:RNA>RNA:DNA>DNA:DNA, in solution. Long probes have better duplex stability with a target, but poorer mismatch discrimination than shorter probes (mismatch discrimination refers to the measured hybridization signal ratio between a perfect match probe and a single base mismatch probe). Shorter probes (e.g., 8-mers) discriminate mismatches very well, but the overall duplex stability is low.

[0081] Altering the thermal stability (T_m) of the duplex formed between the target and the probe using, e.g., known oligonucleotide analogues allows for optimization of duplex stability and mismatch discrimination. One useful aspect of altering the T_m arises from the fact that adenine-thymine (A-T) duplexes have a lower T_m than guanine-cytosine (G-C) duplexes, due in part to the fact that the A-T duplexes have two hydrogen bonds per base pair, while the G-C duplexes have three hydrogen bonds per base pair. In heterogeneous oligonucleotide arrays in which there is a non-uniform distribution of bases, it is not generally possible to optimize hybridization for each oligonucleotide probe simultaneously. Thus, in some embodiments, it is desirable to selectively destabilize G-C duplexes and/or to increase the stability of A-T duplexes. This can be accomplished, e.g., by substituting guanine residues in the probes of an array which form G-C duplexes with hypoxanthine, or by substituting adenine residues in probes which form A-T duplexes with 2,6-diaminopurine or by using tetramethyl ammonium chloride (TMACl) in place of NaCl.

[0082] Altered duplex stability conferred by using oligonucleotide analogue probes can be ascertained by following, e.g., fluorescence signal intensity of oligonucleotide analogue arrays hybridized with a target oligonucleotide over time. The data allow optimization of specific hybridization conditions at, e.g., room temperature.

[0083] Another way of verifying altered duplex stability is by following the signal intensity generated upon hybridization with time. Previous experiments using DNA targets and DNA chips have shown that signal intensity increases with time, and that the more stable duplexes generate higher signal intensities faster than less stable duplexes. The signals reach a plateau or “saturate” after a certain amount of time due to all of the binding sites becoming occupied. These data allow for optimization of hybridization, and determination of the best conditions at a specified temperature.

[0084] Methods of optimizing hybridization conditions are well known to those of skill in the art (see, e.g., Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y., (1993)).

[0085] Signal Detection

[0086] The hybridized nucleic acids can be detected by detecting one or more labels attached to the target nucleic acids. The labels can be incorporated by any of a number of means well known to those of skill in the art. However, in a preferred embodiment, the label is incorporated by labeling the extension primer by carrying out a single base extension reaction using a fluorescently labeled nucleotide.

[0087] Detectable labels suitable for use in the present invention include any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical, or chemical means. Useful labels in the present invention include high affinity binding labels such as biotin for staining with labeled streptavidin conjugate, magnetic beads (e.g., Dynabeads™), fluorescent dyes (e.g., fluorescein, Texas red, rhodamine, green fluorescent protein, and the like), radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P), enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an ELISA), epitope labels, and colorimetric labels such as colloidal gold or colored glass or plastic (e.g., polystyrene, polypropylene, latex, etc.) beads. Patents teaching the use of such labels include U.S. Pat. Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149; and 4,366,241.

[0088] Means of detecting such labels are well known to those of skill in the art. Thus, for example, radiolabels can be detected using photographic film or scintillation counters, and fluorescent markers can be detected using a photodetector to detect emitted light. Enzymatic labels are typically detected by providing the enzyme with a substrate and detecting the reaction product produced by the action of the enzyme on the substrate, and colorimetric labels are detected by simply visualizing the colored label. One method uses colloidal gold label that can be detected by measuring scattered light.

[0089] The label can be added to the amplification products prior to, or after the hybridization. So called “direct labels” are detectable labels that are directly attached to or incorporated into the tagged nucleic acids prior to hybridization. In contrast, so called “indirect labels” are joined to the hybrid duplex after hybridization. Often, the indirect label is attached to a binding moiety that has been attached to the amplified nucleic acid prior to the hybridization. Thus, for example, the amplified nucleic acid can be biotinylated before the hybridization. After hybridization, an avidin-conjugated fluorophore will bind the biotin-bearing hybrid duplexes, providing a label that is easily detected. For a detailed review of methods of labeling nucleic acids and detecting labeled hybridized nucleic acids see Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y., (1993).

[0090] Means of detecting labeled nucleic acids hybridized to the probes of the array are known to those of skill in the art. Thus, for example, where a colorimetric label is used, simple visualization of the label is sufficient. Where a radioactive labeled probe is used, detection of the radiation (e.g. with photographic film or a solid state detector) is sufficient.

[0091] Detection of target nucleic acids which are labeled with a fluorescent label can be accomplished with fluorescence microscopy. The hybridized array can be excited with a light source at the excitation wavelength of the particular fluorescent label and the resulting fluorescence at the emission wavelength is detected. The excitation light source can be a laser appropriate for the excitation of the fluorescent label.

[0092] The confocal microscope can be automated with a computer-controlled stage to automatically scan the entire high density array, i.e., to sequentially examine individual probes or adjacent groups of probes in a systematic manner until all probes have been examined. Similarly, the microscope can be equipped with a phototransducer (e.g., a photomultiplier, a solid state array, a CCD camera, etc.) attached to an automated data acquisition system to automatically record the fluorescence signal produced by hybridization to each oligonucleotide probe on the array. Such automated systems are described at length in U.S. Pat. No. 5,143,854, PCT Application WO 92/10092, and copending U.S. application Ser. No. 08/195,889, filed on Feb. 10, 1994. Use of laser illumination in conjunction with automated confocal microscopy for signal detection permits detection at a resolution of better than about 100 μm, more preferably better than about 50 μm, and most preferably better than about 25 μm.

[0093] Two different fluorescent labels can be used in order to distinguish two alleles at each polymorphic locus examined. In such a case, the array can be scanned two times. During the first scan, the excitation and emission wavelengths are set as required to detect one of the two fluorescent labels. For the second scan, the excitation and emission wavelengths are set as required to detect the second fluorescent label. When the results from both scans are compared, the genotype identification or allele frequency can be determined.

Quantification and Determination of Genotypes

The term “quantifying” when used in the context of quantifying hybridization of a nucleic acid sequence or subsequence can refer to absolute or to relative quantification. Absolute quantification can be accomplished by inclusion of known concentration(s) of one or more target nucleic acids (e.g., control nucleic acids such as Bio B, or known amounts of the target nucleic acids themselves) and referencing the hybridization intensity of unknowns with the known target nucleic acids (e.g., through generation of a standard curve). Alternatively, relative quantification can be accomplished by comparison of hybridization signals between two or more genes, or between two or more treatments to quantify the changes in hybridization intensity and, by implication, the frequency of an allele. Relative quantification can also be used to merely detect the presence or absence of an allele in the target nucleic acids. In one embodiment, for example, the presence or absence of an allelic form of a polymorphic locus can be determined by measuring the quantity of the labeled tag at the known location in the array, i.e., on the solid support, of the corresponding probe.
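
As a hedged illustration of comparing the two scans, the Python sketch below converts the two label intensities read at one probe location into an allele fraction and a simple genotype call. It assumes background-subtracted intensities and labels of comparable brightness, and the 0.2 and 0.8 cut-offs are hypothetical values that would need calibration against samples of known genotype. For a pooled population sample, the fraction itself serves as an estimate of allele frequency.

```python
# Illustrative sketch only: thresholds and function names are assumptions.
def allele_fraction(label_1: float, label_2: float):
    """Fraction of total signal due to the first label, or None if no signal."""
    total = label_1 + label_2
    if total <= 0:
        return None
    return label_1 / total


def call_genotype(label_1: float, label_2: float,
                  low: float = 0.2, high: float = 0.8) -> str:
    """Call a genotype for one locus from the two scan intensities."""
    fraction = allele_fraction(label_1, label_2)
    if fraction is None:
        return "no call"
    if fraction >= high:
        return "homozygous allele 1"
    if fraction <= low:
        return "homozygous allele 2"
    return "heterozygous"


print(call_genotype(900.0, 100.0))  # homozygous allele 1
print(call_genotype(500.0, 500.0))  # heterozygous
print(call_genotype(50.0, 950.0))   # homozygous allele 2
```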

[0094] A preferred quantifying method is to use a confocal microscopeand fluorescent labels. The GeneChip® system (Affymetrix, Santa Clara,Calif.) is particularly suitable for quantifying the hybridization;however, it will be apparent to those of skill in the art that anysimilar system or other effectively equivalent detection method can alsobe used.

[0095] Methods for evaluating the hybridization results vary with the nature of the specific probes used, as well as the controls. The fluorescence intensity for each probe can be quantified simply. This can be accomplished by measuring signal strength at each location (representing a different probe) on the high density array (e.g., where the label is a fluorescent label, detection of the fluorescence intensity produced by a fixed excitation illumination at each location on the array).

[0096] One of skill in the art, however, will appreciate that hybridization signals will vary in strength with efficiency of hybridization, the amount of label on the sample nucleic acid and the amount of the particular nucleic acid in the sample. Typically nucleic acids present at very low levels (e.g., <1 pM) will show a very weak signal. At some low level of concentration, the signal becomes virtually indistinguishable from background. In evaluating the hybridization data, a threshold intensity value can be selected below which a signal is counted as being essentially indistinguishable from background.

[0097] The terms “background” or “background signal intensity” refer to hybridization signals resulting from non-specific binding, or other interactions, between the labeled target nucleic acids and components of the oligonucleotide array (e.g., the oligonucleotide probes, control probes, the array substrate, etc.). Background signals may also be produced by intrinsic fluorescence of the array components themselves. A single background signal can be calculated for the entire array, or a different background signal may be calculated for each target nucleic acid. In a preferred embodiment, background is calculated as the average hybridization signal intensity for the lowest 5% to 10% of the probes in the array, or, where a different background signal is calculated for each target allele, for the lowest 5% to 10% of the probes for each allele. However, where the probes to a particular allele hybridize well and thus appear to be specifically binding to a target sequence, they should not be used in a background signal calculation. Alternatively, background may be calculated as the average hybridization signal intensity produced by hybridization to probes that are not complementary to any sequence found in the sample (e.g., probes directed to nucleic acids of the opposite sense or to genes not found in the sample, such as bacterial genes where the sample is mammalian nucleic acids). Background can also be calculated as the average signal intensity produced by regions of the array that lack any probes at all. In a preferred embodiment, background signal is reduced by the use of a detergent (e.g., C-TAB) or a blocking reagent (e.g., sperm DNA, Cot-1 DNA, etc.) during the hybridization to reduce non-specific binding. In a particularly preferred embodiment, the hybridization is performed in the presence of about 0.5 mg/ml DNA (e.g., herring sperm DNA). The use of blocking agents in hybridization is well known to those of skill in the art (see, e.g., Chapter 8 in P. Tijssen, supra).
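As a non-limiting sketch, the first background calculation described above (the average intensity of the lowest 5% to 10% of probes) could be computed as shown below. The function name `background_from_lowest`, the default fraction of 5%, and the rule of always using at least one probe are assumptions made for this example.

```python
def background_from_lowest(intensities, fraction=0.05):
    """Average the lowest `fraction` of probe intensities.

    Assumption: `fraction` is chosen by the user within the 5% to 10%
    range given in the text; at least one probe is always averaged.
    """
    if not intensities:
        raise ValueError("no probe intensities supplied")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(intensities)
    count = max(1, round(len(ordered) * fraction))
    lowest = ordered[:count]
    return sum(lowest) / count


# Hypothetical array of 100 probes with intensities 1..100.
array_background = background_from_lowest(list(range(1, 101)), fraction=0.05)
```

The same function can be applied per allele by passing only the intensities of the probes for that allele.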

[0098] The high density array can include mismatch controls. In a preferred embodiment, there is a mismatch control having a central mismatch for every probe in the array, except the normalization controls. It is expected that after washing in stringent conditions, where a perfect match would be expected to hybridize to the probe, but not to the mismatch, the signal from the mismatch controls should only reflect non-specific binding or the presence in the sample of a nucleic acid that hybridizes with the mismatch. Where both the probe in question and its corresponding mismatch control show high signals, or the mismatch shows a higher signal than its corresponding test probe, there is a problem with the hybridization and the signal from those probes is ignored. For a given polymorphic locus, the difference in hybridization signal intensity (I_allele1 − I_allele2) between an allele-specific probe (perfect match probe) for a first allele and the corresponding probe for a second allele or an average of several other alleles (or other mismatch control probe) is a measure of the presence of or concentration of the first allele. Thus, in a preferred embodiment, the signal of the mismatch probe is subtracted from the signal for its corresponding test probe to provide a measure of the signal due to specific binding of the test probe.

[0099] The concentration of a particular sequence can then be determined by measuring the signal intensity of each of the probes that bind specifically to that gene and normalizing to the normalization controls. Where the signal from the probes is greater than the mismatch, the mismatch is subtracted. Where the mismatch intensity is equal to or greater than its corresponding test probe, the signal is ignored (i.e., the signal cannot be evaluated).
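The mismatch subtraction and normalization just described can be sketched, purely for illustration, as follows. The names `specific_signal` and `normalized_signal` are hypothetical, and averaging the usable probe pairs before dividing by a single normalization-control intensity is an assumption of this sketch rather than a step stated in the specification.

```python
def specific_signal(perfect_match, mismatch):
    """Return test-probe signal minus mismatch signal.

    Returns None when the mismatch is equal to or greater than the
    test probe, i.e. when the signal cannot be evaluated.
    """
    if mismatch >= perfect_match:
        return None
    return perfect_match - mismatch


def normalized_signal(probe_pairs, normalization_intensity):
    """Average the evaluable (test, mismatch) pairs and normalize.

    Assumption: one scalar normalization-control intensity is used,
    and usable pairs are combined by a plain mean.
    """
    if normalization_intensity <= 0:
        raise ValueError("normalization intensity must be positive")
    usable = []
    for perfect_match, mismatch in probe_pairs:
        signal = specific_signal(perfect_match, mismatch)
        if signal is not None:
            usable.append(signal)
    if not usable:
        return None  # every pair was ignored
    return (sum(usable) / len(usable)) / normalization_intensity


# Hypothetical intensities; the second pair is ignored (mismatch > test).
pairs = [(1000, 200), (300, 400), (600, 200)]
result = normalized_signal(pairs, normalization_intensity=1200)
```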

[0100] For each polymorphic locus analyzed, the genotype can be unambiguously determined by comparing the hybridization patterns obtained to the known locations of the allele-specific probes. When analyzing a DNA sample from a single individual, significant detection of hybridization to a probe indicates the presence of the corresponding allelic form in the genome of the individual. Marginal detection of hybridization, indicated by an intermediate positive result (e.g., less than 1%, or from 1-5%, or from 1-10%, or from 2-10%, or from 5-10%, or from 1-20%, or from 2-20%, or from 5-20%, or from 10-20% of the average of all positive hybridization results obtained for the entire array) may indicate either cross-hybridization or cross-amplification.
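One possible way to apply this rule to a single locus is sketched below. The sketch picks the 1-10% band as the marginal range, treats anything above it as significant detection and anything below it as no detection; those cutoffs, the labels, and the function names `classify_signal` and `call_genotype` are assumptions chosen for the example, since the text lists several alternative ranges.

```python
def classify_signal(signal, mean_positive, low=0.01, high=0.10):
    """Label a probe signal relative to the array's mean positive signal.

    Assumption: signals above `high` count as significant detection,
    signals from `low` to `high` are marginal, and the rest are absent.
    """
    if mean_positive <= 0:
        raise ValueError("mean positive signal must be positive")
    ratio = signal / mean_positive
    if ratio > high:
        return "present"
    if ratio >= low:
        return "marginal"
    return "absent"


def call_genotype(allele_signals, mean_positive, low=0.01, high=0.10):
    """Return the sorted alleles whose probes show significant detection."""
    return sorted(
        allele
        for allele, signal in allele_signals.items()
        if classify_signal(signal, mean_positive, low, high) == "present"
    )


# Hypothetical locus: strong A signal, marginal G signal.
homozygote = call_genotype({"A": 950, "G": 40}, mean_positive=1000)
# Hypothetical locus: both alleles detected.
heterozygote = call_genotype({"A": 900, "G": 1100}, mean_positive=1000)
```

A marginal result is deliberately left out of the call, since the text attributes it to possible cross-hybridization or cross-amplification.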

[0101] Further procedures for data analysis are disclosed in U.S. application Ser. No. 08/772,376, previously incorporated for all purposes.

[0102] Determination of Allele Frequency

[0103] The “allele frequency” is the frequency with which a selected allelic form of a gene exists within a population or selected group of organisms. Allele frequency is determined from the relative intensity of hybridization to probes. The frequency of a selected allelic form can be quantified as the detected number of copies of the selected allele divided by the total number of alleles of the gene possessed by the individuals tested. Statistical methods are available to determine whether the number of individuals tested is representative of a given population. The ratio of different allelic forms in a population can also be determined using the methods described above. For example, if the DNA sample analyzed contains a mixture of DNA from a population of individuals, then the ratio of different allelic forms in the population is measured directly as the ratio of the relative intensities of the label which hybridizes to the probes corresponding to those allelic forms.
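Both calculations in the preceding paragraph can be illustrated with a short sketch. It assumes that the intensities supplied have already been corrected for background and that each label yields the same signal per allele copy; the function names `frequencies_from_intensities` and `frequencies_from_genotypes` and the sample values are hypothetical.

```python
from collections import Counter


def frequencies_from_intensities(intensities):
    """Estimate allele frequencies in a pooled sample from probe intensities.

    Assumption: intensity is proportional to allele copy number, with
    the same proportionality constant for every allele.
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {allele: value / total for allele, value in intensities.items()}


def frequencies_from_genotypes(genotypes):
    """Count allele copies across individually genotyped samples.

    Each genotype is a tuple of the alleles one individual carries.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles supplied")
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical pooled sample: intensities at the probes for alleles A and G.
pooled = frequencies_from_intensities({"A": 750, "G": 250})
# Hypothetical individually genotyped samples.
counted = frequencies_from_genotypes([("A", "A"), ("A", "G"), ("G", "G")])
```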

[0104] It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference for all purposes.

We claim:
 1. A method to aid in detecting a selected allele of a gene in a sample, comprising the steps of: amplifying a region of DNA in the sample, wherein the region comprises a polymorphic locus of the selected allele of the gene, to form an amplified DNA product; labeling an extension primer in the presence of the amplified DNA product, wherein the amplified DNA product serves as a template for the step of labeling, wherein the extension primer comprises a 3′ portion which is complementary to the amplified DNA product and a 5′ portion which is not complementary to the amplified DNA product, wherein the extension primer terminates in a 3′ nucleotide at the polymorphic locus of the selected allele, whereby at least one labeled nucleotide is coupled to the 3′ terminal nucleotide of the extension primer to form a labeled extension primer; and hybridizing the labeled extension primer to a probe on a solid support, wherein at least a portion of the probe is complementary to the 5′ portion of the extension primer.
 2. The method of claim 1 additionally comprising the step of: detecting the label on the solid support, wherein the presence of the label on the solid support indicates the presence of the selected allele in the sample.
 3. The method of claim 1 wherein the label is fluorescent.
 4. The method of claim 1 wherein the label is radioactive.
 5. The method of claim 1 wherein the label is enzymatic.
 6. The method of claim 1 wherein the label is epitopic.
 7. The method of claim 1 wherein the solid support is beads.
 8. The method of claim 1 wherein the solid support is a microtiter dish.
 9. The method of claim 1 wherein the DNA is genomic.
 10. The method of claim 1 wherein the DNA is cDNA.
 11. The method of claim 1 wherein the DNA is mitochondrial.
 12. The method of claim 1 wherein the DNA is viral.
 13. The method of claim 1 wherein the DNA in the sample was obtained from more than one individual.
 14. The method of claim 1 wherein labeled extension products of a plurality of samples from one or more individuals are mixed prior to the step of hybridizing.
 15. The method of claim 1 wherein extension primers complementary to two or more selected alleles of the gene are used in the step of labeling.
 16. The method of claim 15 wherein quantities of label at known locations on the solid support are compared and a ratio of nucleotides at the polymorphic locus in the sample is determined.
 17. The method of claim 16 wherein the sample comprises DNA from two or more individuals.
 18. The method of claim 1 wherein primers complementary to selected alleles at two or more polymorphic loci are used in the steps of amplifying and labeling.
 19. The method of claim 18 wherein quantities of label at known locations on the solid support are compared and a ratio of nucleotides at each polymorphic locus is determined.
 20. The method of claim 19 wherein the sample comprises DNA from two or more individuals.
 21. A method to aid in detecting a selected allele of a gene in a sample, comprising the steps of: amplifying a region of DNA in the sample, wherein the region comprises a polymorphic locus of the selected allele of the gene using an amplification primer to form an amplified DNA product, wherein the primer terminates in a 3′ nucleotide at the polymorphic locus of the selected allele; labeling an extension primer in the presence of the amplified DNA product, wherein the amplified DNA product serves as a template for the step of labeling, wherein the extension primer comprises a 3′ portion which is complementary to the amplified DNA product and a 5′ portion which is not complementary to the amplified DNA product, wherein the extension primer terminates in a 3′ nucleotide at the polymorphic locus of the selected allele, whereby at least one labeled nucleotide is coupled to the 3′ terminal nucleotide of the extension primer to form a labeled extension primer; and hybridizing the labeled extension primer to a probe on a solid support, wherein at least a portion of the probe is complementary to the 5′ portion of the extension primer.
 22. The method of claim 21 additionally comprising the step of: detecting the label on the solid support, wherein the presence of the label on the solid support indicates the presence of the selected allele in the sample.
 23. The method of claim 21 wherein the label is fluorescent.
 24. The method of claim 21 wherein the label is enzymatic.
 25. The method of claim 21 wherein the label is epitopic.
 26. The method of claim 21 wherein the label is radioactive.
 27. The method of claim 21 wherein the solid support is beads.
 28. The method of claim 21 wherein the solid support is a microtiter dish.
 29. The method of claim 21 wherein the DNA is genomic.
 30. The method of claim 21 wherein the DNA is cDNA.
 31. The method of claim 21 wherein the DNA is mitochondrial.
 32. The method of claim 21 wherein the DNA is viral.
 33. The method of claim 21 wherein the DNA in the sample was obtained from more than one individual.
 34. The method of claim 21 wherein labeled extension products of a plurality of samples from more than one individual are mixed prior to the step of hybridizing.
 35. The method of claim 21 wherein extension primers complementary to two or more selected alleles of the gene are used in the step of labeling.
 36. The method of claim 35 wherein quantities of label at known locations on the solid support are compared and a ratio of nucleotides at the polymorphic locus in the sample is determined.
 37. The method of claim 36 wherein the sample comprises DNA from two or more individuals.
 38. The method of claim 21, wherein primers complementary to selected alleles at two or more polymorphic loci are used in the steps of amplifying and labeling.
 39. The method of claim 38 wherein quantities of label at known locations on the solid support are compared and a ratio of nucleotides at the polymorphic locus in the sample is determined.
 40. The method of claim 39 wherein the sample comprises DNA from two or more individuals.
 41. A kit comprising in a single container a set of primers for use in detecting a selected allele of a gene, said set comprising: a pair of amplification primers which amplify a region of the gene comprising a polymorphic locus; and an extension primer, wherein the extension primer terminates in a 3′ nucleotide which is at the polymorphic locus of the selected allele, wherein a 3′ portion of the extension primer is complementary to the selected allele, wherein a 5′ portion of the extension primer is complementary to all or a portion of a probe on a solid support but not complementary to the amplified region of the gene.
 42. The kit of claim 41 which comprises two or more extension primers, wherein the 3′ portion of each extension primer is complementary to a different allele of the gene.
 43. The kit of claim 41 which comprises two or more sets of primers, wherein each amplification primer pair is complementary to a different gene.
 44. The kit of claim 41 further comprising one or more solid supports comprising one or more probes, wherein all or a portion of said one or more probes is complementary to the 5′ portion of an extension primer.
 45. A kit comprising in a single container a set of primers for use in detecting an allele of a gene, said set comprising: a pair of amplification primers which specifically amplify a selected allele, wherein the pair of primers comprises a first and a second primer, wherein the first and second primers are complementary to opposite strands of the selected allele, wherein the first primer terminates in a 3′ nucleotide which is at a polymorphic locus of the selected allele; and an extension primer, wherein a 3′ portion of the extension primer is complementary to the selected allele and terminates in a 3′ nucleotide which is at the polymorphic locus of the selected allele, wherein a 5′ portion of the extension primer is complementary to a probe on a solid support but not complementary to the amplified region of the DNA target.
 46. The kit of claim 45 which comprises two or more sets of primers, wherein the first primer of each amplification primer pair is complementary to a different allele.
 47. The kit of claim 45 further comprising one or more solid supports comprising one or more probes, wherein all or a portion of the one or more probes is complementary to a 5′ portion of an extension primer of a selected allele.
 48. A method to aid in detecting a selected allele of a gene in a sample, comprising the steps of: labeling an extension primer in the presence of DNA in a sample which comprises a gene, wherein the DNA serves as a template for the step of labeling, wherein the extension primer comprises a 3′ portion which is complementary to the DNA and a 5′ portion which is not complementary to the DNA, wherein the extension primer terminates in a 3′ nucleotide at the polymorphic locus of the selected allele, whereby at least one labeled nucleotide is coupled to the 3′ terminal nucleotide of the extension primer to form a labeled extension primer; and hybridizing the labeled extension primer to a probe on a solid support, wherein at least a portion of the probe is complementary to the 5′ portion of the extension primer.
 49. The method of claim 48 additionally comprising the step of: detecting the label on the solid support, wherein the presence of the label on the solid support indicates the presence of the selected allele in the sample.
 50. The method of claim 48 wherein the label is fluorescent.
 51. The method of claim 48 wherein the label is radioactive.
 52. The method of claim 48 wherein the label is enzymatic.
 53. The method of claim 48 wherein the label is epitopic.
 54. The method of claim 48 wherein the DNA in the sample was obtained from more than one individual.
 55. The method of claim 48 wherein labeled extension products of samples from more than one individual are mixed prior to the step of hybridizing.
 56. The method of claim 48 wherein extension primers complementary to two or more selected alleles of the gene are used in the step of labeling.
 57. The method of claim 56 wherein quantities of label at known locations on the solid support are compared and a ratio of nucleotides at the polymorphic locus in the sample is determined.
 58. The method of claim 57 wherein the sample comprises DNA from two or more individuals.
 59. The method of claim 48 wherein primers complementary to selected alleles at two or more polymorphic loci are used in the step of labeling.
 60. The method of claim 59 wherein quantities of label at known locations on the solid support are compared and a ratio of nucleotides at the polymorphic locus in the sample is determined.
 61. The method of claim 60 wherein the sample comprises DNA from two or more individuals.
 62. The method of claim 48 wherein the solid support is beads.
 63. The method of claim 48 wherein the solid support is a microtiter dish.
 64. The method of claim 48 wherein the DNA is genomic.
 65. The method of claim 48 wherein the DNA is cDNA.
 66. The method of claim 48 wherein the DNA is mitochondrial.
 67. The method of claim 48 wherein the DNA is viral.